fix(box): Linux/KVM runtime fixes - bridge/Compose as root, overlay fs commands, snapshot restore, image integrity #5
Merged
added 4 commits
May 30, 2026 23:00
When a3s-box runs as root, passt drops privileges to `nobody` and then
cannot create its control socket under the box's 0700 `~/.a3s` home, so the
runtime times out ("passt socket did not appear within 5 seconds") and every
bridge-mode `run --network` and all `compose up` boots fail.
Place the passt socket in the world-traversable runtime socket directory
(alongside the exec/PTY sockets) and widen that directory's permissions under
cfg(unix) so the dropped user can bind it. Also capture passt's stderr instead
of discarding it to /dev/null and detect immediate child exit, so a real spawn
failure surfaces instead of a misleading 5s timeout.
Verified on a real Linux/KVM host: bridge boxes get an eth0 IP and `compose up`
starts services.
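
The early-exit detection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `spawn_captured` and `wait_for_socket` are hypothetical names, the error type is a plain `String`, and the 20 ms poll interval is an arbitrary choice. It uses only `std::process` and shows why capturing stderr and polling `try_wait()` turns a silent 5s timeout into an actionable error.

```rust
use std::io::Read;
use std::path::Path;
use std::process::{Child, Command, Stdio};
use std::time::{Duration, Instant};

/// Spawn a helper with stderr captured instead of sent to /dev/null.
/// Hypothetical helper; the real spawn code may pass more options.
fn spawn_captured(program: &str, args: &[&str]) -> std::io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::piped())
        .spawn()
}

/// Wait until `socket` exists, failing fast if the child exits first.
fn wait_for_socket(child: &mut Child, socket: &Path, timeout: Duration) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        if socket.exists() {
            return Ok(());
        }
        // try_wait() yields Some(status) once the child has exited.
        if let Some(status) = child.try_wait().map_err(|e| e.to_string())? {
            let mut stderr = String::new();
            if let Some(mut pipe) = child.stderr.take() {
                // Assumes the child left no descendants holding the pipe,
                // so this read reaches EOF promptly.
                let _ = pipe.read_to_string(&mut stderr);
            }
            return Err(format!("helper exited early ({status}): {}", stderr.trim()));
        }
        if Instant::now() >= deadline {
            return Err(format!(
                "socket {} did not appear within {timeout:?}",
                socket.display()
            ));
        }
        std::thread::sleep(Duration::from_millis(20));
    }
}
```

With this shape, a helper that dies immediately reports its own stderr text, and the timeout message is reserved for the case where the process is still alive but never binds the socket.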
diff/export/commit hard-coded `<box_dir>/rootfs`, which only exists for the plain/copy provider. The default Linux overlay provider materializes the rootfs at `<box_dir>/merged`, so all three commands failed with "Rootfs not found".

Add a shared `resolve_box_rootfs()` helper that prefers the overlay `merged` directory and falls back to `rootfs`, and use it in all three commands. The diff baseline snapshot is taken against the same resolved rootfs.

Verified: export tars the full rootfs, commit produces a runnable image, and diff reports added files correctly.
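
A minimal sketch of the lookup order described above, assuming the helper takes the box directory and returns an `Option<PathBuf>`. The actual signature and error handling of `resolve_box_rootfs()` in this PR are not shown in the description, so treat both as assumptions.

```rust
use std::path::{Path, PathBuf};

/// Prefer the overlay provider's `merged` directory; fall back to `rootfs`.
/// Returns None when neither exists, in which case a caller would report
/// "Rootfs not found". The return type here is an assumption.
fn resolve_box_rootfs(box_dir: &Path) -> Option<PathBuf> {
    ["merged", "rootfs"]
        .iter()
        .map(|name| box_dir.join(name))
        .find(|candidate| candidate.is_dir())
}
```

Keeping the order in one place means diff, export, and commit cannot disagree about which directory is the live rootfs.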
…tion

snapshot create captured `<box_dir>/rootfs` (empty under overlay) and restore booted the restored box from the image, losing all filesystem changes.

- snapshot create now captures the resolved rootfs (overlay `merged`).
- snapshot restore writes a `.snapshot-rootfs` marker; prepare_layout boots directly from `<box_dir>/rootfs` when that marker is present, so the restored box reflects the snapshot. The marker gate guarantees normal boxes (which never have `rootfs` under overlay) are unaffected.

Also harden guest-init selection (layout.rs): is_linux_elf rejects dynamically linked ELFs (it requires that no PT_INTERP segment be present) so a glibc host build can never be chosen as PID 1, and candidates rank musl-static builds first on all platforms.

Verified: restored box contains files written before the snapshot.
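
The PT_INTERP criterion can be illustrated with a small parser. This is a sketch under stated assumptions, not the `is_linux_elf` from layout.rs: the function name `is_static_elf64` is hypothetical, and it only handles 64-bit little-endian ELF files. The field offsets (`e_phoff` at 32, `e_phentsize` at 54, `e_phnum` at 56) and `PT_INTERP = 3` come from the ELF64 format itself.

```rust
/// Program header type for the dynamic loader path (ELF spec).
const PT_INTERP: u32 = 3;

/// Read N bytes at `off`, or None if the slice is too short.
fn field<const N: usize>(b: &[u8], off: usize) -> Option<[u8; N]> {
    b.get(off..off.checked_add(N)?)?.try_into().ok()
}

/// None means the file is truncated or malformed.
fn has_no_interp(bytes: &[u8]) -> Option<bool> {
    // Magic, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian).
    if !bytes.starts_with(b"\x7fELF") || *bytes.get(4)? != 2 || *bytes.get(5)? != 1 {
        return Some(false);
    }
    let phoff = usize::try_from(u64::from_le_bytes(field(bytes, 32)?)).ok()?;
    let phentsize = usize::from(u16::from_le_bytes(field(bytes, 54)?));
    let phnum = usize::from(u16::from_le_bytes(field(bytes, 56)?));
    for i in 0..phnum {
        // p_type is the first u32 of every program header entry.
        let off = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if u32::from_le_bytes(field(bytes, off)?) == PT_INTERP {
            return Some(false); // dynamically linked: needs an interpreter
        }
    }
    Some(true)
}

/// True only for a well-formed ELF64 LE image with no PT_INTERP segment.
/// Hypothetical name; anything malformed is rejected.
fn is_static_elf64(bytes: &[u8]) -> bool {
    has_no_interp(bytes).unwrap_or(false)
}
```

Rejecting on any parse failure keeps the check conservative: a file that cannot be fully inspected is never selected as PID 1.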
…subnet

- oci/layers: apply OCI whiteout semantics during extraction (.wh.<name> and .wh..wh..opq), so files deleted in upper layers don't reappear and markers are not written into the rootfs. Normal entries delegate to unpack_in to preserve symlink/perm/mtime fidelity. (+ unit tests)
- oci/registry: verify the SHA-256 of each pulled config/layer blob against the manifest before storing it content-addressed.
- core/network: Ipam::new rejects prefix_len 0 (and uses checked arithmetic for the gateway) to fix a shift-overflow panic / garbage subnet. (+ unit tests)
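
To make the whiteout rules concrete, here is a minimal sketch of how a layer entry path might be classified before extraction. The `LayerEntry` enum and `classify` function are hypothetical names for illustration; the PR's actual extraction code is not shown here. The marker names themselves (`.wh.<name>` and `.wh..wh..opq`) are those named in the commit message.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum LayerEntry {
    /// `.wh..wh..opq`: hide everything lower layers placed in this directory.
    Opaque(PathBuf),
    /// `.wh.<name>`: delete `<name>` from the same directory.
    Whiteout(PathBuf),
    /// Anything else is extracted normally.
    Normal,
}

/// Classify a tar entry path by its final component.
fn classify(entry: &Path) -> LayerEntry {
    let name = match entry.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return LayerEntry::Normal,
    };
    let parent = entry.parent().unwrap_or_else(|| Path::new(""));
    match name.strip_prefix(".wh.") {
        // ".wh..wh..opq" minus the ".wh." prefix leaves ".wh..opq".
        Some(".wh..opq") => LayerEntry::Opaque(parent.to_path_buf()),
        Some(target) if !target.is_empty() => LayerEntry::Whiteout(parent.join(target)),
        _ => LayerEntry::Normal,
    }
}
```

An extractor built on this would act on `Opaque` and `Whiteout` by deleting paths under the rootfs and would never write the marker files themselves; only `Normal` entries would be unpacked.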
Summary
Fixes a set of correctness, security, and lifecycle bugs found by exercising the
full command surface on a real Linux x86_64 + KVM host (libkrun). Several of
these completely broke core features when a3s-box runs as root (the common
server/KVM case). All changes are scoped to the Linux/overlay path and are
cross-platform safe (platform-specific code is `cfg`-gated; verified to compile
clean on macOS arm64).
Fixes (4 commits)
- When a3s-box runs as root, passt dropped privileges to `nobody`, then could not create its socket under the box's 0700 `~/.a3s` home, so every `run --network` and `compose up` failed with "passt socket did not appear". The socket now lives in the world-traversable runtime socket dir (next to the exec/PTY sockets) with appropriate permissions; passt stderr is captured and early exit is detected (no more misleading 5s timeout).
- diff/export/commit hard-coded `<box_dir>/rootfs`, which only exists for the plain provider; the default Linux overlay provider uses `<box_dir>/merged`. Added `resolve_box_rootfs()` (prefers `merged`, falls back to `rootfs`).
- snapshot create now captures the resolved rootfs; restore writes a marker and the runtime boots the restored box from the captured rootfs instead of rebuilding from the image. Also hardens guest-init selection: reject dynamically-linked ELFs (a glibc host build can never be picked as PID 1) and rank musl-static builds first.
- Apply OCI whiteout semantics during extraction (deleted files no longer reappear); verify the SHA-256 of pulled config/layer blobs before storing; reject `/0` subnets in `Ipam` (was a shift-overflow panic).
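
The `/0` failure mode comes from computing a netmask as `u32::MAX << (32 - prefix_len)`: with a prefix length of 0 the shift amount is 32, which overflows a `u32` shift. A minimal sketch of the guarded computation follows. `plan_subnet` is a hypothetical stand-in for `Ipam::new`, and both the gateway being network + 1 and the rejection of prefixes above 32 are assumptions for illustration.

```rust
use std::net::Ipv4Addr;

/// Returns (network, gateway) for `base/prefix_len`.
/// Hypothetical stand-in for the validation done in `Ipam::new`.
fn plan_subnet(base: Ipv4Addr, prefix_len: u8) -> Result<(Ipv4Addr, Ipv4Addr), String> {
    if prefix_len == 0 || prefix_len > 32 {
        return Err(format!("invalid prefix length /{prefix_len}"));
    }
    // 32 - prefix_len is now within 0..=31, so the shift cannot overflow.
    let mask = u32::MAX << (32 - u32::from(prefix_len));
    let network = u32::from(base) & mask;
    // Assumption: the gateway is the first address after the network address.
    let gateway = network
        .checked_add(1)
        .ok_or_else(|| "gateway address overflows".to_string())?;
    Ok((Ipv4Addr::from(network), Ipv4Addr::from(gateway)))
}
```

Validating the prefix before any shift, and using `checked_add` for the gateway, turns both the panic and the garbage-subnet case into ordinary errors.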
Verification (real KVM host)
- `core_smoke`: 14/14 (was 9/14 before these fixes)
- `command_coverage`: 6/6 · `host_smoke` VM matrix + Compose: pass
- `/0` & overflow: pass
- `cargo clippy -D warnings` clean; macOS arm64 `cargo check` clean